Skip to content

fix race in JoinFuture::poll when waker changes between notified check and waiters list update#104

Open
morgangallant wants to merge 1 commit intoarthurprs:masterfrom
morgangallant:master
Open

fix race in JoinFuture::poll when waker changes between notified check and waiters list update#104
morgangallant wants to merge 1 commit intoarthurprs:masterfrom
morgangallant:master

Conversation

@morgangallant
Copy link

Hi there — thanks for making such a lovely crate! Ran into an issue upgrading from 0.6.2 -> 0.6.18, spent a bit of time trying to get to the bottom of this. Specifically, managed to hit this unchecked_unreachable.

I think the issue is: When a pending JoinFuture is re-polled with a different waker, poll() checks notified without holding the state lock, then acquires the lock to update its entry in the waiters list. A concurrent insert can drain the waiters list between these two operations, leaving the find to fail and hit the unreachable.

Had Claude put together a reproduction script, which consistently hits this unreachable without this fix:

struct TestWaker;
impl Wake for TestWaker {
    fn wake(self: Arc<Self>) {}
}

#[test]
fn test_join_future_poll_race() {
    let cache = Cache::<u64, u64>::new(100);

    for _ in 0..2_000_000 {
        cache.remove(&0);

        let GuardResult::Guard(guard) = cache.get_value_or_guard(&0, None) else {
            continue;
        };

        let key: u64 = 0;
        let mut fut = Box::pin(cache.get_value_or_guard_async(&key));

        // First poll: JoinFuture registers itself as a waiter with waker W1.
        let w1 = Waker::from(Arc::new(TestWaker));
        assert!(fut.as_mut().poll(&mut Context::from_waker(&w1)).is_pending());

        // Race: insert on a scoped thread while we re-poll with a different
        // waker. The waker change forces JoinFuture into the update path that
        // checks `notified` outside the lock.
        std::thread::scope(|s| {
            s.spawn(|| {
                let _ = guard.insert(42);
            });

            let w2 = Waker::from(Arc::new(TestWaker));
            let _ = fut.as_mut().poll(&mut Context::from_waker(&w2));
        });
    }
}

(Apologies if I'm off the mark here, would love any input + help here!)

Signed-off-by: Morgan Gallant <morgan@morgangallant.com>
Copy link
Owner

@arthurprs arthurprs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I will probably try to augment the shuttle test to catch this kind of problem later. If you're up for it, feel free to include it in the PR.

let new_waker = cx.waker();
if !waker.will_wake(new_waker) {
let mut state = shared.state.write();
// Re-check notified after acquiring the lock. A concurrent
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try merging this match arm with the one above, maybe

JoinFutureState::Pending { waker, shared } => loop {
    if this.notified.load(Ordering::Acquire) {
       ..
    }
    ..
    let state = shared.state.write();
    if this.notified.load(Ordering::Acquire) {
        continue;
    }
    ...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants